Time-varying networks describe a wide array of systems whose constituents and interactions evolve over 
time. They are defined by an ordered stream of interactions between nodes, yet they are often represented in 
terms of a sequence of static networks, each aggregating all edges and nodes present in a time interval of size 
$\Delta t$. In this work we quantify the impact of an arbitrary $\Delta t$ on the description of a dynamical process taking 
place upon a time-varying network. We focus on the elementary random walk, and put forth a simple 
mathematical framework that describes well the behavior observed on real datasets. The analytical 
description of the bias introduced by time-integrating techniques represents a step forward in the correct 
characterization of dynamical processes on time-varying graphs. 

Time-varying networks are ubiquitous. Examples are found in the social, cognitive, technological and 
ecological domains, as well as in many others. The temporal nature of such systems has a deep influence on 
dynamical processes occurring on top of them. Indeed, the spreading of sexually transmitted diseases, the 
diffusion of topics over social networks, and the propagation of ideas in scientific environments are affected by 
the duration, sequence, and concurrency of contacts. In all these cases the timescale characterizing the 
evolution of the network is comparable with the timescale ruling the unfolding of the process, and they cannot be 
decoupled. However, empirical datasets are often reduced to a series of static networks by introducing a 
time-integrating window, $\Delta t$. This is the case, for instance, of face-to-face interaction networks, for which the 
fine-grained temporal resolution of (e.g.) phone call networks is not available, or of infants' semantic networks, 
whose evolution can be studied only through the analysis of a few snapshots. In other instances, a time window is 
introduced to reduce the amount of stored information, or to simplify the application of mathematical 
frameworks developed for static or annealed systems. This is the case, for example, of online social networks where, 
although the original information usually has time resolutions down to the second, the available datasets are 
integrated over different windows of hours, days, months, or even years. Thus, the introduction of an integrating 
window is either intrinsic to the system under study or dictated by practical reasons. 

In this work we address the impact of an arbitrary $\Delta t$ on the description of a discrete dynamical process taking 
place upon a time-varying network. Despite recent results showing that the presence of any level of temporal 
aggregation may affect the correct characterization of dynamical processes evolving on top of such datasets, an 
analytical formalization, characterization, and understanding of these effects for a general $\Delta t$ is still missing. 

In particular, we focus on the prototypical random walk process evolving on time-varying networks integrated 
over a general time window $\Delta t$. First, we clarify the relevance of the integrating window issue by studying the 
behavior of random walk processes on real time-varying networks as a function of $\Delta t$. Then, we introduce a 
mathematical framework that describes well the observed behavior on synthetic activity driven networks as well 
as on two different real datasets. 

Results 

We aim to understand how $\Delta t$ affects the behavior of dynamical processes taking place on time-varying networks. 
To this end, we consider the fundamental random walk (RW) process on two different real time-varying networks 
in which the links have been integrated over different integrating windows $\Delta t$ (see Fig. 1). Typically, the RW 
asymptotic occupation probability $p$ (see Methods for the formal definition) is computed by grouping the nodes 
according to their degree $k$. The quantity $p_k$ is then defined as the average asymptotic occupation 
probability of a node in the degree class $k$. However, in time-varying networks the degree of a node is not 
univocally defined and, more importantly, is a function of $\Delta t$. For example, the degree might be the number of 
connections integrated over the time window, or the average number of connections across the $T/\Delta t$ static 
frames (where $T$ is the total time span of the data). Thus, the same node could contribute to different 
degree classes depending on the value of $\Delta t$. We therefore focus on a different node measure that has been 
shown to be mostly invariant to $\Delta t$, namely the activity rate $a$ of a node. The activity rate $a$ is defined 
as the average rate at which each node interacts with others during the observation period $[0, T]$, and can be 
interpreted as the intrinsic attitude of each node to engage in interactions with other nodes. We 
aim to calculate the occupation probability as a function of $a$. 

Figure 1 | Example of time integration on time-varying networks. The 
random walker is located on the colored node, and can travel on the links 
depicted as continuous lines, while $\Delta t$ defines the integration window. 
Dashed lines represent links that are present in the system, but are out of 
reach for the walker. 
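
As a rough illustration of the activity rate, the sketch below estimates it from a stream of timestamped interactions. The `(timestamp, u, v)` event format and the function name `activity_rates` are assumptions made for this example, and counting both endpoints of every interaction is a simplification (appropriate when the initiator of an undirected contact is unknown), not necessarily the estimator used for the datasets in this work.

```python
from collections import Counter


def activity_rates(events, total_time):
    """Estimate each node's activity rate as interactions per unit time.

    events: iterable of (timestamp, u, v) tuples (hypothetical input format).
    total_time: length T of the observation period [0, T].
    Assumption: both endpoints of an interaction count as taking part in it.
    """
    counts = Counter()
    for _, u, v in events:
        counts[u] += 1
        counts[v] += 1
    return {node: count / total_time for node, count in counts.items()}


# Toy stream observed over T = 10 time units.
events = [(0.5, "a", "b"), (1.2, "a", "c"), (3.9, "b", "c"), (7.0, "a", "b")]
rates = activity_rates(events, total_time=10.0)
```

Because the estimate only counts interactions over the whole period, it does not depend on how the stream is later cut into windows, which is the property that motivates using activity instead of degree.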

In our simulations we consider two real time-varying networks, 
and investigate the RW occupation probability as a function of the activity 
rate $a$ and the integrating window $\Delta t$: $p_a(\Delta t)$. The first dataset is the 
co-authorship network of the Physical Review Letters (PRL) journal 
from 1980 to 2006. The second dataset is the Yahoo! music dataset 
with —4.6 X 10^ songs rated by ~2 X 10^ Yahoo! users over six 
months. We run the RW process over these two time-varying networks 
for different values of $\Delta t$, and record the occupation probability 
over multiple runs (see SI for details). Fig. 2 shows the empirical 
values of $p_a(\Delta t)$ (solid points) observed in the PRL dataset for four 
distinct values of $\Delta t \in \{1, 10, 60, 182\}$ days. Error bars represent the 
standard deviation obtained from distinct simulation runs starting 
at times $t_0 \in \{0, 1, \ldots, \Delta t - 1\}$ from the beginning of the dataset. 

The effect of $\Delta t$ is dramatic. Over large values of $\Delta t$ the RW 
behaves roughly as could be expected. The share of random walkers 
increases with the node activity, i.e., highly active nodes collect 
more walkers at the end of the simulation than nodes with low 
activity. However, as $\Delta t$ decreases, more active nodes lose their power 
to attract walkers and the occupation probability becomes more 
uniform. A similar scenario is observed over the Yahoo! dataset over 
four values of $\Delta t$, namely one second, one hour, six hours, and one 
day (points in Fig. 3). In the next section we will see that the reason 
for this behavior rests solely in the probability that the RW sees no 
edges when it decides to move, which turns out to be a function of 
three factors: $\Delta t$, the activity of the node where the walker resides, and the 
average node activity in the system. 

Mathematical formulation. Let us consider a random walker diffusing 
at discrete time steps $\Delta t$ over a time-varying network characterized 
by $N$ nodes. Starting at node $V(t)$ at step $t$, the walker takes 
step $t + 1$ at time $(t + 1)\Delta t$, diffusing over a network $G_t(\Delta t)$, where 
$G_t(\Delta t)$ is the result of the union of all the edges generated in the 
interval $[t\Delta t, (t + 1)\Delta t)$. We focus on the general case of an 
arbitrary time aggregation window $\Delta t > 0$. 

Figure 2 | Occupation probability $p_a$ of a RW at the end of the simulation 
as a function of node activity. The points are the values of $p_a$ of a RW over 
the Physical Review Letters time-varying co-authorship network from 1980 
to 2006 for different integrating windows $\Delta t \in \{1, 10, 60, 182\}$ days. The 
error bars are evaluated starting the process at different days from the 
beginning of the dataset. 
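
The aggregation step can be sketched in a few lines. This is a minimal illustration, assuming the data arrive as `(timestamp, u, v)` tuples (a hypothetical format) and that edges are undirected and unweighted; the function name `aggregate_snapshots` is likewise made up for the example.

```python
import math
from collections import defaultdict


def aggregate_snapshots(events, dt):
    """Group timestamped edges into windows [t*dt, (t+1)*dt).

    Returns a dict mapping the window index t to the set of undirected
    edges (frozensets of two nodes) observed in that window.
    """
    snapshots = defaultdict(set)
    for time, u, v in events:
        if u == v:
            continue  # ignore self-interactions
        t = math.floor(time / dt)
        snapshots[t].add(frozenset((u, v)))  # repeated contacts collapse to one edge
    return dict(snapshots)


events = [(0, "a", "b"), (1, "b", "c"), (1, "c", "b"), (5, "a", "c")]
by_day = aggregate_snapshots(events, dt=1)   # three sparse snapshots
by_week = aggregate_snapshots(events, dt=7)  # one snapshot holding every edge
```

With `dt=1` the walker sees at most one edge per step, whereas with `dt=7` all three edges are available at once, which is the kind of difference the rest of this section quantifies.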

We consider a simple class of time-varying networks called activity 
driven networks. The crucial ingredients of these models are: $dF(a)$, 
the fraction of nodes with activity rate $a$, and $m$, the number of edges 
that are simultaneously created by a node (see Methods for further 
details). The activity rate determines the probability per unit time for 
a node to establish $m$ simultaneous edges to other nodes in the 
system. The value of the parameter $m$ is dictated by the specific system 
under consideration. The case $m > 1$ is appropriate to describe 
one-to-many interactions, found for example in such systems as Twitter 
and blog networks. On the other hand, $m = 1$ describes two-party 
(dyadic) communications that are characteristic of phone-call and 
text-message networks. At each step $t = 0, 1, \ldots$ an unweighted 
network $G_t(\Delta t)$ is generated as follows: 

a) $G_t(\Delta t)$ starts with $N$ disconnected nodes; 

b) The number of times a node with activity $a$ is active during 
interval $\Delta t$, $K_{\Delta t,a}$, is Poisson distributed, 

$$P[K_{\Delta t,a} = k] = \frac{(a\Delta t)^k}{k!}\exp(-a\Delta t).$$

The node generates $mK_{\Delta t,a}$ undirected edges connected to $mK_{\Delta t,a}$ 
randomly selected nodes (without replacement or self-loops). 
Nodes that are inactive during this period of $\Delta t$ may still receive 
connections from other active vertices; 

c) At time $(t + 1)\Delta t$ the process starts over from step a) to generate 
network $G_{t+1}(\Delta t)$. 

Figure 3 | Occupation probability $p_a$ of a RW at the end of the simulation 
as a function of node activity. Points represent the $p_a$ values of a RW over 
the time-varying graph of Yahoo! song ratings for different integrating 
windows $\Delta t$ of one second, one hour, six hours, and one day. The standard 
deviations are too small to be shown in the plots. 
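
Steps a) to c) can be turned into a small generator. The sketch below is only an illustration under stated assumptions: node activities are passed as a plain list, activations are counted by simulating a Poisson process with exponential waiting times, and the number of targets is capped at $N - 1$ when $mK_{\Delta t,a}$ exceeds the number of other nodes. The names `poisson_count` and `activity_driven_snapshot` are hypothetical.

```python
import random


def poisson_count(rate, dt, rng):
    """Number of events of a Poisson process with the given rate within dt."""
    if rate <= 0:
        return 0
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed < dt:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def activity_driven_snapshot(activities, dt, m=1, rng=None):
    """Generate the edge set of one aggregated network G_t(dt)."""
    rng = rng or random.Random()
    n = len(activities)
    edges = set()  # step a): start from N disconnected nodes
    for node, a in enumerate(activities):
        k = poisson_count(a, dt, rng)  # step b): activations during dt
        if k == 0:
            continue  # inactive nodes can still be chosen as targets by others
        others = [v for v in range(n) if v != node]      # no self-loops
        targets = rng.sample(others, min(m * k, n - 1))  # without replacement
        for v in targets:
            edges.add(frozenset((node, v)))
    return edges  # step c): call again to obtain the next window


rng = random.Random(42)
activities = [0.0, 0.5, 1.0, 0.2]
snapshots = [activity_driven_snapshot(activities, dt=2.0, m=1, rng=rng) for _ in range(3)]
```

Each call is independent of the previous one, which reflects the memoryless character of the model.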

Although activity driven networks are Markovian (memoryless) 
and lack some properties observed in real temporal systems, they 
can be considered as the simplest yet nontrivial framework to study 
the concurrence of changes in the connectivity pattern of the network 
and dynamical processes unfolding on its structure. 

To describe the RW behavior, we need to evaluate the transition 
probability that a walker starting at a node with activity $a'$ moves to a 
node with activity $a$ at the next $\Delta t$ time step, $Q_{a|a'}(\Delta t)$. Without loss 
of generality, in what follows we focus on the case $m = 1$. Detailed 
results for the $m > 1$ one-to-many interactions are discussed in the 
Supplementary Information. At step $t + 1$ the neighbors of $V(t)$ can 
be classified into two types: 

1. Passive destinations are neighbors of $V(t)$ connected by edges 
created due to the activity of $V(t)$ itself. They are randomly 
selected from the graph and thus their activity is distributed 
according to $dF(a)$. We define $K_{\Delta t,A(t)}$ to be the number of such 
passive destinations, where $A(t)$ is the activity rate of node $V(t)$. 

2. Active destinations are neighbors of $V(t)$ connected to $V(t)$ by 
edges created due to their own activity. Thus, their activity is 
distributed as $a\,dF(a)/\langle a \rangle$, where $\langle a \rangle$ is the average activity rate in 
the system. We define $H_{\Delta t}$ as the number of such active 
destinations. 

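The word destinations highlights the fact that the walker moves 
from $V(t)$ to one of these $K_{\Delta t,a'} + H_{\Delta t}$ neighbors of $V(t)$. For 
sufficiently large $N$, $H_{\Delta t}$ and $K_{\Delta t,a'}$ are both Poisson distributed with 
average $\langle a \rangle \Delta t$ and $a' \Delta t$, respectively. If $V(t)$ has at least one edge, the 
walker follows the edge of a passive destination with probability 
$K_{\Delta t,a'}/(K_{\Delta t,a'} + H_{\Delta t})$, while it moves towards an active destination 
with probability $H_{\Delta t}/(K_{\Delta t,a'} + H_{\Delta t})$. Unconditioning the latter 
expressions with respect to the values of $K_{\Delta t,a'}$ and $H_{\Delta t}$ we obtain 

$$Q_{a|a'}(\Delta t) = \sum_{\substack{k,h \ge 0 \\ k+h>0}} \left[\frac{k}{k+h}\,dF(a) + \frac{h}{k+h}\,\frac{a\,dF(a)}{\langle a \rangle}\right] \frac{(a'\Delta t)^k (\langle a \rangle \Delta t)^h}{k!\,h!}\, \exp(-(a' + \langle a \rangle)\Delta t) + \delta(a - a')\, \exp(-(a' + \langle a \rangle)\Delta t), \tag{1}$$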

where $\delta(x)$ is the Dirac delta function. While we refer the reader to 
the SI for the detailed derivation, each term in eq. (1) has a simple 
interpretation. The two terms inside the double sum represent, 
respectively, the probability that the walker moves to a passive 
destination that has activity $a$ and the probability that the walker moves 
to an active destination that has activity $a$. The terms multiplying the 
two terms inside the double summation are related to the probability 
that $K_{\Delta t,a'} = k$ and $H_{\Delta t} = h$. The $\delta(a - a')$ term considers the 
probability that the node has no edges after $\Delta t$ and thus the walker 
must remain at $V(t)$. 

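Thankfully, eq. (1) can be simplified (see SI), yielding 

$$Q_{a|a'}(\Delta t) = \frac{a' + a}{a' + \langle a \rangle}\, dF(a)\,(1 - C_{a',\Delta t}) + \delta(a' - a)\, C_{a',\Delta t}, \tag{2}$$

where $C_{a',\Delta t} = \exp(-(a' + \langle a \rangle)\Delta t)$ is the probability that no edge is created at a 
node with activity $a'$ during interval $\Delta t$. Note that in eq. (2) the 
parameter $\Delta t$ only affects the probability that no edge is created until 
the next time step. 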
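
The simplification can be checked numerically. The sketch below is an illustration rather than the SI derivation: it assumes that the passive and active moves are weighted by $k/(k+h)$ and $h/(k+h)$ over independent Poisson counts with means $a'\Delta t$ and $\langle a \rangle\Delta t$, truncates the double sum at `kmax` terms, and compares the result with the closed-form weights $\frac{a'}{a'+\langle a \rangle}(1-C)$ and $\frac{\langle a \rangle}{a'+\langle a \rangle}(1-C)$, where $C = \exp(-(a'+\langle a \rangle)\Delta t)$. Multiplying the first weight by $dF(a)$ and the second by $a\,dF(a)/\langle a \rangle$ and adding them gives the factor $\frac{a'+a}{a'+\langle a \rangle}\,dF(a)\,(1-C)$. All function names are hypothetical.

```python
import math


def poisson_pmf(mean, kmax):
    """Poisson probabilities P[X = k] for k = 0, ..., kmax - 1."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def move_weights_by_sum(a_prime, a_mean, dt, kmax=100):
    """Truncated double sum over (k, h) with k + h > 0."""
    pk = poisson_pmf(a_prime * dt, kmax)  # passive destinations
    ph = poisson_pmf(a_mean * dt, kmax)   # active destinations
    passive = active = 0.0
    for k in range(kmax):
        for h in range(kmax):
            if k + h > 0:
                weight = pk[k] * ph[h]
                passive += weight * k / (k + h)
                active += weight * h / (k + h)
    return passive, active


def move_weights_closed_form(a_prime, a_mean, dt):
    """Closed-form weights of the passive and active moves."""
    no_edge = math.exp(-(a_prime + a_mean) * dt)
    total = a_prime + a_mean
    return a_prime / total * (1.0 - no_edge), a_mean / total * (1.0 - no_edge)


summed = move_weights_by_sum(0.3, 0.1, dt=5.0)
closed = move_weights_closed_form(0.3, 0.1, dt=5.0)
```

The two pairs agree to numerical precision because, conditioned on $k + h = n > 0$, the number of passive destinations is binomial with success probability $a'/(a' + \langle a \rangle)$, so the expected fraction $k/(k+h)$ does not depend on $n$.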

To find the RW stationary distribution we first note that the RW 
on the time-varying network is stationary and ergodic (see SI). Thus, 
the RW occupation probability $p_a$, defined as the probability of finding 
the walker in a given node of activity $a$, exists and is unique. The 
value of $p_a$ is the fixed point solution of the following 
Chapman-Kolmogorov set of equations 

$$p_a = \frac{1}{N\,dF(a)} \sum_{a' \in \Omega} Q_{a|a'}(\Delta t)\, N\,dF(a')\, p_{a'}, \tag{3}$$

where $\Omega$ is the set of all activity rates in the system. The solution to eq. 
(3) can be obtained numerically. Interestingly, we can extend eq. (3) 
to consider lazy random walks where the walker moves with 
probability $p \in (0, 1]$ or does not move with probability $1 - p$. For the 
lazy walker we just need to replace $Q_{a|a'}(\Delta t)$ in eq. (3) with 
$Q_{a|a'}(\Delta t)\,p + \delta(a' - a)(1 - p)$. A simple algebraic manipulation shows that $p_a$ 
does not change with $p$. Hence, the steady state of the lazy walker for 
any $p \in (0, 1)$ is the same as that of the walker that moves with probability 
$p = 1$. 
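
One simple way to obtain the numerical solution is fixed-point iteration over a finite set of activity classes. The sketch below makes several assumptions: activities are discretized into classes with fractions summing to one, the Dirac delta becomes a Kronecker delta, the no-edge probability of a class-$a'$ node is taken to be $\exp(-(a' + \langle a \rangle)\Delta t)$, and the iteration runs on the total probability mass per class, $N\,dF(a)\,p_a$. The function name and the `move_prob` argument (the lazy-walker probability) are hypothetical, and this is not necessarily the solver used for the figures.

```python
import math


def stationary_occupation(activities, fractions, n_nodes, dt,
                          move_prob=1.0, tol=1e-13, max_iter=200_000):
    """Iterate the class-level transition matrix to its fixed point.

    Returns the per-node occupation probability p_a for each activity class.
    """
    n = len(activities)
    a_mean = sum(a * f for a, f in zip(activities, fractions))
    # Probability that a node of each class has no edge in a window dt.
    stay = [math.exp(-(a + a_mean) * dt) for a in activities]
    # q[i][j]: probability that a walker on a class-j node is next on class i.
    q = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            move = ((activities[j] + activities[i]) / (activities[j] + a_mean)
                    * fractions[i] * (1.0 - stay[j]))
            same = 1.0 if i == j else 0.0
            step = move + same * stay[j]
            q[i][j] = move_prob * step + (1.0 - move_prob) * same
    mass = list(fractions)  # start from the uniform state p_a = 1/N
    for _ in range(max_iter):
        new_mass = [sum(q[i][j] * mass[j] for j in range(n)) for i in range(n)]
        change = max(abs(x - y) for x, y in zip(new_mass, mass))
        mass = new_mass
        if change < tol:  # stop once an iteration no longer changes the state
            break
    return [mass[i] / (n_nodes * fractions[i]) for i in range(n)]


activities = [0.01, 0.1, 1.0]
fractions = [0.7, 0.2, 0.1]
p_long = stationary_occupation(activities, fractions, n_nodes=1000, dt=200.0)
p_short = stationary_occupation(activities, fractions, n_nodes=1000, dt=0.01)
p_lazy = stationary_occupation(activities, fractions, n_nodes=1000, dt=1.0, move_prob=0.5)
```

Convergence slows down as `dt` shrinks, since the walker rarely finds an edge to follow; for production use a direct eigenvector or linear solve would likely be preferable to plain iteration.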

We also find that closed-form solutions of eq. (3) exist in the limits 
of $\Delta t \gg 1$ and $\Delta t \ll 1$. In the $\Delta t \gg 1$ case, links are integrated over a 
large time window and the time-varying network can be considered 
static. Recall that $C_{a,\Delta t} = e^{-(a + \langle a \rangle)\Delta t}$. For $\Delta t \gg 1$ the value of $C_{a,\Delta t} \approx 0$, 
$\forall a \in \Omega$, and thus the second term of eq. (2) is close to zero. In this 
scenario $Q_{a|a'}(\Delta t) = C(a + \langle a \rangle)\,dF(a)$, where $C = 1/(2\langle a \rangle)$, yielding the 
fixed point solution of eq. (3) 

$$p_a = \frac{a + \langle a \rangle}{2N\langle a \rangle}. \tag{4}$$

The asymptotic occupation probability of a given node of class $a$ 
grows linearly with its activity. Since in the regime of large $\Delta t$ the 
degree of a node $v$, $k_v$, is proportional to its activity, $a_v$, eq. (4) yields 
$p_{a_v} \propto k_v$. Thus, for sufficiently large $\Delta t$, we recover the well-known 
behavior of static networks, where the occupation probability of a 
node is proportional to its degree. Furthermore, in the SI we show 
that eqs. (2), (3), and (4) hold for weighted aggregation procedures 
where integrated edges have weights proportional to how often they 
appeared during an interval $\Delta t$. 

In the regime of very short aggregating windows we have 
$\lim_{\Delta t \to 0} C_{a,\Delta t} = 1$, $\forall a \in \Omega$. Thus, the first term of eq. (2) is zero, yielding 
$Q_{a|a'}(\Delta t) = dF(a)$ and the trivial fixed point solution of eq. (3) 

$$p_a = \frac{1}{N}. \tag{5}$$

Thus, the walker is equally likely to be found at any node regardless of 
its activity rate. In fact, when $\Delta t$ is small the probability that a node has 
more than one edge is close to zero. Consequently, highly active 
nodes lose and gain walkers at the same rate, giving rise to the 
homogeneous occupation probabilities in eq. (5). Interestingly, in previous 
work on general time-varying network processes we show that the 
result in eq. (5) holds even when aggregated snapshots have arbitrarily 
strong spatio-temporal correlations. 

Numerical validation on synthetic networks. We validated our 
analytical results through extensive numerical simulations. We 
considered networks with N = 10^ nodes and a power-law activity 
distribution dF{a) oc a"^ (as observed in many real networks^^), 
restricted to the interval Q = [10"^ 1] to avoid divergencies in the 
limit $a \to 0$. As shown in Fig. 4, the exact solution reproduces the 
simulations accurately for the entire spectrum of integrating 
windows $\Delta t$ (case $m = 1$ in main panel). Interestingly, as $\Delta t$ grows, 
the occupation probability increases sharply in high-activity vertices 
while slightly decreasing at low-activity nodes. Moreover, as $\Delta t$ 
Figure 4 | Occupation probability of a RW over an activity-driven 
network with activity distribution c/F( a) oc a~^a e (10~^1),N= 10^for 
different values of $m$. Curves in the main plot concern the $m = 1$ case, 
where each node can only simultaneously connect to one node. In the inset, 
the case $m = 6$ is considered, where a node simultaneously connects to six 
other nodes. Solid curves represent the analytical prediction of eq. (3) 
integrated over $\Delta t$ = 1, 10, 100 (diamonds, squares and circles) time 
windows. Note that in both panels as $\Delta t$ gets larger $p_a \sim a$. Averages 
performed over 10^ independent simulations. 

increases, $p_a \propto a$ as predicted by eq. (4), while as $\Delta t$ gets smaller, $p_a = 
1/N$, as predicted by eq. (5). The equations also correctly describe the 
behavior observed for one-to-many simultaneous connections $m$, 
characterized by a smoother increase in $p_a$ at high-activity nodes 
(see the $m = 6$ case in Fig. 4, inset). The SI contains more details on 
the formulation of the $m > 1$ case. 

Numerical validation on real-world networks. The analytical framework 
discussed above also qualitatively reproduces the behavior 
observed in real datasets. In Figs. 5 and 6 the solid lines show the 
numerical solution obtained by substituting eq. (2) into eq. (3) (see SI), 
for the PRL and Yahoo! datasets, respectively. The gray points in 
Figs. 5 and 6 reproduce the simulation results already shown in 
Figs. 2 and 3, respectively. All numerical solutions use the same 
activity distribution $dF(a)$, extracted from the time-varying graph of 
$\Delta t = 1$ day for the PRL dataset and $\Delta t = 1$ second for the Yahoo! 
dataset ($dF(a)$ extracted from larger values of $\Delta t$ provides similar 
results, see SI for details). 

Figure 5 | Occupation probability $p_a$ of a RW at the end of the simulation 
as a function of node activity. The points are the values of $p_a$ of a RW over 
the Physical Review Letters time-varying co-authorship network from 1980 
to 2006 for different integrating windows $\Delta t \in \{1, 10, 60, 182\}$ days. The 
solid curves show the respective numerical solutions of eq. (3) and the 
black curve shows eq. (4). 

Figure 6 | Occupation probability $p_a$ of a RW at the end of the simulation 
as a function of node activity. The points are the values of $p_a$ of a RW over 
the time-varying graph of Yahoo! song ratings for different integrating 
windows $\Delta t$ of one second, one hour, six hours, and one day. The solid 
curves show the respective numerical solutions of eq. (3) and the black 
curve shows eq. (4). 

The theoretical results accurately describe real data, with some 
deviations for nodes in the intermediate activity range at $\Delta t$ of one 
day. The RW occupation probability is uniform and independent of 
node activity for small $\Delta t$, as predicted by eq. (5). As predicted by eq. 
(4), the RW occupation probability $p_a$ approaches $(a + \langle a \rangle)/(2N\langle a \rangle)$ 
(black curve) as $\Delta t$ increases, an effect particularly noticeable for 
high-activity nodes. It is also worth highlighting that the data 
match well the theoretical equations for the case $m = 1$, suggesting 
a connection between the datasets and the fundamental mechanisms 
described in our model (for the similarity in behavior between $m = 1$ 
and projected networks such as the PRL co-authorship networks see 
SI). 

Discussion 

Our results clarify the effect of time aggregation procedures on the 
behavior of the RW, taken as the simplest instance of a dynamical 
process, even when aggregation windows are "short". We have 
quantified this effect in a rigorous mathematical framework that (i) allows 
us to recover the results concerning static networks in the limit of 
infinite aggregation windows, (ii) accurately describes the behavior 
observed in numerical simulations upon synthetic time-varying 
networks, and (iii) captures the phenomenology observed on real 
datasets. Overall, while for practical or technical reasons researchers are 
often forced, or simply tempted, to work with time-aggregated 
representations of time-varying networks, our work suggests that caution 
should be used when drawing general conclusions about dynamical 
processes based upon time-aggregated networks. At the same time, 
our theoretical results may help to investigate possible 
distortions introduced by the aggregating windows of data collection 
methods. 

The proposed framework considers inherently discrete processes, 
such as spreading phenomena in contact networks, which are discrete 
even at the smallest possible time resolution. We leave the 
generalization to continuous processes for further work. 

Methods 

Occupation probability. The asymptotic occupation probability is the steady state 
probability of finding the walker in a node with activity $a$, which is guaranteed to exist 
and be unique if the time-varying network is stationary, ergodic, and 
T-connected (see SI), as is the case for activity driven networks. A time-varying network is 
T-connected if there is a temporal path between any two nodes. In our simulations we 
consider the RW occupation probability $p_a$ to be the probability of finding the walker 
in a node with activity $a$ at the end of the simulation period $[0, T]$, given that the walker 
starts at a random node. 

Activity-driven networks. Activity-driven network models are based on the activity 
patterns of nodes, which are used to explicitly model the evolution of the network 
structure over time. It can be shown that the full dynamics of the network are 
encoded in the activity rate distribution, $dF(a)$, and that the time-aggregated 
measurement of network connectivity yields a degree distribution that follows the 
same functional form as the distribution $dF(a)$ in the limit of small $k/N$ and $k/T$. 
This is an important feature of the model, which is able to reproduce basic statistical 
properties found in many real networks, giving a simple prescription to characterize 
explicitly dynamical connectivity patterns. 

Datasets & simulation. In this study we considered two different empirical 
projections of bipartite time-varying networks: the collaborations in the journal 
Physical Review Letters (PRL), published by the American Physical Society, and the 
Yahoo! music dataset made available by Yahoo!. 

PRL dataset. The bipartite network representation of this dataset has two types of 
nodes: authors and papers. An author is connected to all the papers she/he wrote in an 
integrating window $\Delta t$. We study the bipartite projection over the authors. In this 
representation each author of an article in PRL is a node. Undirected edges connect 
authors that collaborate in the same article. We focus just on small collaborations, 
filtering out all the articles with more than 10 authors. We consider the period 
between 1958 and 2006. The dataset contains 80,554 authors and 66,892 articles. The 
smallest timescale available is one day. 

Yahoo! music dataset. In this dataset the bipartite network has two types of nodes: users 
and songs. We study the bipartite projection over the songs. Each node is a song and 
two songs are connected if at least one user rated both in a time window $\Delta t$. The 
dataset contains 4.6 X 10^ songs rated by 199,719 Yahoo! users, collected in 
the course of six months. User activity is recorded at a time resolution of seconds. 

Simulation setup. We obtain the empirical walker occupation probability, $p_a$, as 
follows. We construct the transition probability matrix $P_t$ associated to the RW on the 
$t$-th aggregated network $G_t(\Delta t)$, $t = 0, \ldots, T/\Delta t$, where $T$ is the time of the last event in 
the dataset. The empirical RW occupation probability is obtained by multiplying the 
matrices $P_0 P_1 \cdots P_n$ and then left-multiplying the result by the vector $(1/N, \ldots, 1/N)$, 
which gives the walker equal probability of starting at any node. We note in 
passing that similar results are obtained when the walker starts at a handful of high 
activity nodes. 
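
The procedure can be sketched without building the matrices explicitly: propagating the row vector through one snapshot at a time is equivalent to left-multiplying it by the product of transition matrices. The code below is a minimal illustration, assuming each snapshot is given as a list of undirected `(u, v)` edges listed once, and that a walker on a node without edges stays where it is; the function name `walk_over_snapshots` is hypothetical.

```python
def walk_over_snapshots(snapshots, nodes):
    """Propagate a uniform initial distribution through a sequence of edge lists."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    prob = [1.0 / n] * n  # the walker starts at a random node
    for edges in snapshots:  # one RW step per aggregated network
        neighbors = [[] for _ in range(n)]
        for u, v in edges:
            neighbors[index[u]].append(index[v])
            neighbors[index[v]].append(index[u])
        new_prob = [0.0] * n
        for i, nbrs in enumerate(neighbors):
            if not nbrs:
                new_prob[i] += prob[i]  # isolated node: the walker stays put
            else:
                share = prob[i] / len(nbrs)  # move to a uniformly chosen neighbor
                for j in nbrs:
                    new_prob[j] += share
        prob = new_prob
    return dict(zip(nodes, prob))


nodes = ["a", "b", "c"]
snapshots = [[("a", "b"), ("a", "c")], [("b", "c")]]
occupation = walk_over_snapshots(snapshots, nodes)
```

In this toy example node `a` is connected to both other nodes in the first window and isolated in the second, so it ends up holding two thirds of the probability. Averaging the resulting values over nodes with similar activity would then give an empirical $p_a$.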